Biostatistics For Dummies (Monika Wahi John Pezzullo)

When you plan statistical testing on a fourfold table, consider the process. As described in Chapter 12,
you formulate a null hypothesis (H0) about the fourfold table, set the significance level (such as
α = 0.05), calculate a test statistic, find the corresponding p value, and interpret the result. With a
fourfold table, one obvious test to use is the chi-square test (if the necessary assumptions are met). The
chi-square test evaluates whether membership in a particular row is statistically significantly
associated with membership in a particular column. The p value from the chi-square test is the
probability that random fluctuations alone, in the absence of any real effect in the population, could
have produced an observed effect at least as large as what you saw in your sample. If the p value is
less than α (0.05 in this scenario), the effect is said to be statistically significant, and the
null hypothesis is rejected. Assessing significance using a chi-square test is the most common approach
to testing a cross-tab of any size, including a fourfold table. But fourfold tables can also serve as the
basis for other metrics besides chi-square tests that are useful in other ways, and those metrics are
discussed in this chapter.
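
As a rough illustration of the testing steps just described, the sketch below computes the Pearson chi-square statistic and its p value for a fourfold table. It is written in Python rather than R, the cell counts are hypothetical, the function name chi_square_2x2 is made up for this example, and no continuity correction is applied. It relies on the fact that a fourfold table has 1 degree of freedom, so the p value can be obtained from the complementary error function.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a fourfold table.

    a and b are the counts in the first row; c and d are the counts in the
    second row. Returns the chi-square statistic and its p value (1 df).
    """
    r1, r2 = a + b, c + d          # row totals
    c1, c2 = a + c, b + d          # column totals
    t = r1 + r2                    # grand total
    chi2 = t * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical cell counts, chosen only for illustration.
chi2, p = chi_square_2x2(30, 20, 15, 35)
alpha = 0.05
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The shortcut for the p value works only for 2-by-2 tables; larger cross-tabs have more degrees of freedom and need a general chi-square distribution function.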

In the rest of this chapter, we describe many useful calculations that you can derive from the
cell counts in a fourfold table. The statistical software that cross-tabulates your raw data can
provide these indices, depending on the commands it has available (see Chapter 4 for a review
of statistical software). Thankfully (and uncharacteristically), unlike in most chapters in this
book, the formulas for many indices derived from fourfold tables are simple enough to do
manually with a calculator (or in Microsoft Excel). All you need are the counts (frequencies)
in each of the four cells. You can also calculate these indices with a web page, which is
available at https://statpages.info/ctab2x2.html. This chapter demonstrates how to
calculate these indices in R (free, open-source software described in Chapter 4).

Like any other value you calculate from a sample, an index calculated from a fourfold table is a
sample statistic, which is an estimate of the corresponding population parameter. A good researcher
always wants to quote the precision of that estimate. In Chapter 10, we describe how to calculate the
standard error (SE) and confidence interval (CI) for sample statistics such as means and proportions.
Likewise, in this chapter, we show you how to calculate the SE and CI for the various indices you can
derive from a fourfold table.

Though an index itself may be easy to calculate manually, its SE or CI usually is not. Approximate
formulas are available for some of the more common indices. These formulas are usually based on the
fact that the random sampling fluctuations of an index (or its logarithm) are often nearly normally
distributed if the sample size is large enough. We provide approximate formulas for SEs where they’re
available, and demonstrate how to calculate them in R when possible.
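
To make the "index or its logarithm" idea concrete, here is a minimal sketch that uses the odds ratio as an example index. The choice of the odds ratio, the hypothetical counts, the function name odds_ratio_ci, and the use of Python instead of R are all assumptions made for illustration. The sketch applies the common large-sample approximation in which the log of the odds ratio is treated as nearly normal, with SE equal to the square root of the sum of the reciprocals of the four cell counts.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate CI built on the log scale.

    a and b are the counts in the first row; c and d are the counts in the
    second row. z = 1.96 gives an approximate 95% interval. All four counts
    must be greater than zero for this approximation to be usable.
    """
    odds_ratio = (a * d) / (b * c)
    # Approximate SE of ln(odds ratio) for large samples.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    # Build the interval on the log scale, then convert back.
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, se_log, lower, upper

# Hypothetical cell counts, chosen only for illustration.
odds_ratio, se_log, lower, upper = odds_ratio_ci(30, 20, 15, 35)
print(f"OR = {odds_ratio:.2f}, approximate 95% CI = {lower:.2f} to {upper:.2f}")
```

Because the interval is symmetric on the log scale, it is not symmetric around the odds ratio itself once converted back, which is expected behavior for this kind of approximation.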

For consistency, all the formulas in this chapter refer to the four cell counts of the fourfold
table, and the row totals, column totals, and grand total, in the same standard way (see Figure 13-1).
This convention is used in many online resources and textbooks.
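
The figure is not reproduced here, so the sketch below assumes one widely used labeling: a and b for the first row, c and d for the second row, r1 and r2 for the row totals, c1 and c2 for the column totals, and t for the grand total. These names, the class name FourfoldTable, and the counts are hypothetical, and the code is Python rather than R. Check the labels against the figure before relying on them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FourfoldTable:
    """Four cell counts of a 2-by-2 table (assumed labeling, see lead-in)."""
    a: int  # row 1, column 1
    b: int  # row 1, column 2
    c: int  # row 2, column 1
    d: int  # row 2, column 2

    @property
    def r1(self):  # total of row 1
        return self.a + self.b

    @property
    def r2(self):  # total of row 2
        return self.c + self.d

    @property
    def c1(self):  # total of column 1
        return self.a + self.c

    @property
    def c2(self):  # total of column 2
        return self.b + self.d

    @property
    def t(self):  # grand total
        return self.a + self.b + self.c + self.d

# Hypothetical cell counts, chosen only for illustration.
table = FourfoldTable(a=30, b=20, c=15, d=35)
print(table.r1, table.r2, table.c1, table.c2, table.t)
```

Deriving every total from the four cells, instead of storing the totals separately, keeps the margins consistent with the cell counts.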